Mixed models are an extremely useful modeling tool for situations in which there is some dependency among observations in the data, where the correlation typically arises from those observations being clustered in some way. For example, it is quite common to have data in which we have repeated measurements for units of observation, or in which units of observation are otherwise grouped together (e.g., students within schools, cities within geographic regions). While there are different ways to approach such situations, mixed models are a very common and powerful tool to do so. In addition, they have ties to other statistical approaches that further extend their applicability.
The primary goal of this document is to provide a sense of when one would use mixed models and to present a variety of standard techniques for implementing them.
Mixed models have been around a long time in the statistical realm. For example, standard ANOVA methods can be seen as special cases of a mixed model. More recently, mixed models have seen a variety of applications and extensions, allowing them to handle a wide range of data situations. They can be seen as a first step in expanding one's toolbox beyond the generalized linear model.
Terminology
For the uninitiated, the terminology surrounding mixed models, especially across disciplines, can be a bit confusing. Some terms you might come across regarding these types of models include:
Mixed effects models
All of these describe types of mixed models. Some terms are more historical, some are seen more often in a specific discipline, some refer to a particular data structure, and some are special cases.
Mixed effects models, or simply mixed models, generally refer to a mixture of fixed and random effects.
For the models in general, I prefer the terms 'mixed models' or 'random effects models', because they are simpler terms, no specific structure is implied, and the latter can also apply to extensions that many would not think of when other terms are used. As for the mixed effects, 'fixed effects' is perhaps a poor, but nonetheless adequate, term for the typical main effects one sees in a linear regression model, i.e. the non-random part of a mixed model; in some contexts they are referred to as the 'population average' effects. Though you will hear many definitions, random effects are simply those specific to an observational unit, however defined. The approach outlined in this document largely concerns the case where the observational unit is the level of some grouping factor, but that is only one of several possibilities.
Kinds of structure
In terms of structure, data may have one or multiple sources of clustering, and that clustering may be hierarchical, such that clusters are nested within other clusters. An example would be scholastic aptitude tests administered several times to students (repeated observations nested within students, students nested within schools, schools nested within districts). In other cases, there is no nesting structure. An example would be a reaction time experiment in which participants perform the same set of tasks: while observations are nested within each individual, observations are also grouped by task type. Some use the terms nested and crossed to distinguish these scenarios. In addition, clustering may or may not be balanced. We might expect more balance in studies of an experimental nature, but definitely not in other cases, for example when the clustering involves something like a geographical unit and the observations are people.
In what follows, we'll see mixed effects models in all of these data situations. In general, our approach will be the same, since such clustering is really more a property of the data than of the model. However, it is important to be aware of the flexibility of mixed models to handle a variety of data situations. It is also worth noting that other structures, such as temporal or spatial ones, may arise. We will mention them in a few places in this document, but they will not be the focus.
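As a preview of how lme4, the package used throughout this document, expresses nested versus crossed structures, here is a sketch using two of its built-in datasets: Pastes has a nested structure (casks within batches) and Penicillin a crossed one (every sample measured on every plate).

```r
library(lme4)

# Nested grouping: casks nested within batches -> (1 | batch/cask)
m_nested = lmer(strength ~ 1 + (1 | batch/cask), data = Pastes)

# Crossed grouping: plates and samples are crossed -> two separate terms
m_crossed = lmer(diameter ~ 1 + (1 | plate) + (1 | sample), data = Penicillin)

VarCorr(m_nested)   # variance components for cask within batch, and batch
VarCorr(m_crossed)  # variance components for plate and for sample
```

The formula syntax is the only thing that changes; the modeling approach is otherwise the same in both cases.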
Random intercepts model
Below we demonstrate the simplest and most common case of a mixed model, one in which we have a single grouping/cluster structure for the random effect. For reasons we hope will become clear shortly, this is commonly called a random intercepts model.
Example: student GPA
For our first example, we'll assess the factors that predict college grade point average (GPA). Each of 200 students is assessed on six occasions (each semester during the first three years), so we have observations clustered within students. We have other variables such as job status, sex, and high school GPA. Some will be available in both numeric and labeled form.
load('data/gpa.RData')
datatable(gpa, options = list(
pageLength = 5,
autoWidth = TRUE,
initComplete = JS(
"function(settings, json) {",
"$('table.dataTable').css({'font-size': '12px'});",
"}"
)
))
The standard regression model
Now for the underlying model. We can show it a couple of different ways. First, we start with a standard regression to get our bearings.
\[ \text{GPA} = \beta_{\text{intercept}} + \beta_{\text{occasion}} \cdot \text{occasion} + \epsilon \]
We have coefficients (\(\beta\)) for the intercept and the effect of time. The error (\(\epsilon\)) is assumed to be normally distributed with mean 0 and some standard deviation \(\sigma\).
\[ \epsilon \sim \mathcal{N}(0, \sigma) \]
An alternate way to write the model, which puts the emphasis on the underlying data-generating process for GPA, is as follows.
\[ \text{GPA} \sim \mathcal{N}(\mu, \sigma) \]
\[ \mu = \beta_{\text{intercept}} + \beta_{\text{occasion}} \cdot \text{occasion} \]
More technically, GPA and \(\mu\) carry an implicit subscript denoting each observation, but you can also think of this as a model for a single individual at a single time point.
Now we show one way of depicting a mixed model that includes a unique effect for each student. Consider the following model for a single student. It demonstrates that the student-specific effect, i.e. the deviation in GPA just for that student being who they are, can be seen as an additional source of variance.
\[ \text{GPA} = \beta_{\text{intercept}} + \beta_{\text{occasion}} \cdot \text{occasion} + (\text{effect}_{\text{student}} + \epsilon) \]
We would usually assume the following for the student effects.
\[ \text{effect}_{\text{student}} \sim \mathcal{N}(0, \tau) \]
So the student effects are random, and specifically they are normally distributed with mean zero and some estimated standard deviation (\(\tau\)). In other words, conceptually, the only difference between this mixed model and a standard regression is the student effect, which on average is no effect, but which typically varies from student to student by some amount that is on average \(\tau\).
If we rearrange things, we can instead focus on the model coefficients, rather than viewing the student effect as an additional source of error.
\[ \text{GPA} = (\beta_{\text{intercept}} + \text{effect}_{\text{student}}) + \beta_{\text{occasion}} \cdot \text{occasion} + \epsilon \]
Or more succinctly:
\[ \text{GPA} = \beta_{\text{int\_student}} + \beta_{\text{occasion}} \cdot \text{occasion} + \epsilon \]
In this way, we will have student-specific intercepts, as each person will have their own unique effect added to the overall intercept, resulting in a different intercept for each person.
\[ \beta_{\text{int\_student}} \sim \mathcal{N}(\beta_{\text{intercept}}, \tau) \]
Now we see the intercepts as normally distributed with a mean of the overall intercept and some standard deviation. As such, this is often called a random intercepts model.
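To make that concrete, here is a small simulation of what 'intercepts distributed around the overall intercept' looks like; the values chosen for the overall intercept and \(\tau\) are made up purely for illustration.

```r
set.seed(1234)

b_intercept = 2.6   # overall intercept (illustrative value)
tau         = 0.25  # standard deviation of the student effects (illustrative)

# Each of 200 students gets the overall intercept plus their own random effect
student_intercepts = b_intercept + rnorm(200, mean = 0, sd = tau)

mean(student_intercepts)  # close to b_intercept
sd(student_intercepts)    # close to tau
```

Each simulated student thus has their own starting point, while the average starting point across students remains the overall intercept.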
Another way of showing the mixed model is common in the multilevel modeling literature. There the model is written more explicitly as a two-part regression, one part at the observation level and one at the student level.
\[ \text{GPA} = \beta_{\text{int\_student}} + \beta_{\text{occasion}} \cdot \text{occasion} + \epsilon \]
\[ \beta_{\text{int\_student}} = \beta_{\text{intercept}} + \text{effect}_{\text{student}} \]
However, after 'plugging in' the second-level part to the first, it is identical to the previous formulation.
Note that we do not have a student-specific effect for occasion. In this context, occasion is said to be a fixed effect only, with no random component. This definitely does not have to be the case though, as we'll see later.
Application
Initial visualization
It always helps to look before we leap, so let's do so. Here we plot GPA against occasion (i.e. semester) to get a sense of the variability in starting points and trends.
library(lme4)
## Loading required package: Matrix
##
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
##
## expand, pack, unpack
##
## Attaching package: 'lme4'
## The following object is masked from 'package:expss':
##
## dummy
gpa_mixed = lmer(gpa ~ occasion + (1|student), data=gpa)
summary(gpa_mixed)
## Linear mixed model fit by REML ['lmerMod']
## Formula: gpa ~ occasion + (1 | student)
## Data: gpa
##
## REML criterion at convergence: 408.9
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.6169 -0.6373 -0.0004 0.6361 2.8310
##
## Random effects:
## Groups Name Variance Std.Dev.
## student (Intercept) 0.06372 0.2524
## Residual 0.05809 0.2410
## Number of obs: 1200, groups: student, 200
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 2.599214 0.021696 119.8
## occasion 0.106314 0.004074 26.1
##
## Correlation of Fixed Effects:
## (Intr)
## occasion -0.469
[As a test, replace 1|student with
1|sample(1:10, 1200, replace = TRUE). Because the variance due to
such arbitrary grouping is essentially 0, the residual error estimate
will be similar to that of the lm model.]
People always ask where the p-values are, but the answer is…
complicated. Other packages and programs present them as if they are
trivially obtained, but that is not the case, and the lme4
developers would rather not make unnecessary assumptions. On the plus
side, you can get interval estimates easily enough, even though they are
poorly named for the variance components: .sig01 is the
standard deviation of the student effect, and .sigma is the
residual standard deviation.
confint(gpa_mixed)
## Computing profile confidence intervals ...
## 2.5 % 97.5 %
## .sig01 0.22517423 0.2824604
## .sigma 0.23071113 0.2518510
## (Intercept) 2.55665145 2.6417771
## occasion 0.09832589 0.1143027
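Note that these intervals for `.sig01` and `.sigma` are on the standard deviation scale; squaring the point estimates recovers the variances reported in the earlier summary. A quick arithmetic check using the values shown above:

```r
# SD-scale point estimates from the random effects summary above
sds = c(student = 0.2524, residual = 0.2410)

# Squaring gives the variance scale: ~0.0637 and ~0.0581, matching summary()
sds^2
```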
Now examine the random effects.
ranef(gpa_mixed)$student
## (Intercept)
## 1 -0.070894923
## 2 -0.215578439
## 3 0.088256945
## 4 -0.186641736
## 5 0.030383538
## 6 -0.302388548
## ... [remaining rows omitted; one estimated intercept deviation for each of the 200 students]
coef(gpa_mixed)$student
## (Intercept) occasion
## 1 2.528319 0.1063143
## 2 2.383636 0.1063143
## 3 2.687471 0.1063143
## 4 2.412573 0.1063143
## 5 2.629598 0.1063143
## 6 2.296826 0.1063143
## ... [remaining rows omitted; one row for each of the 200 students, all sharing the occasion slope 0.1063143]
These per-student coefficients combine the overall intercept with each student's random effect; as we didn't allow the occasion effect to vary, it is constant across students. We'll change this later.
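In fact, the `coef` output is exactly the fixed effects plus the corresponding random effects. A self-contained check of that relationship, using lme4's built-in sleepstudy data (the same holds for the gpa model above):

```r
library(lme4)

# Fit a random intercepts model to the built-in sleepstudy data
m = lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# Per-subject intercepts = overall (fixed) intercept + subject random effect
int_combined = fixef(m)[['(Intercept)']] + ranef(m)$Subject[['(Intercept)']]

all.equal(int_combined, coef(m)$Subject[['(Intercept)']])  # TRUE
```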
predict(gpa_mixed, re.form=NA) %>% head
## 1 2 3 4 5 6
## 2.599214 2.705529 2.811843 2.918157 3.024471 3.130786
See exercises.
For this exercise, we’ll use the sleep study data from the
lme4 package. The following describes it.
The average reaction time per day for subjects in a sleep deprivation study. On day 0 the subjects had their normal amount of sleep. Starting that night they were restricted to 3 hours of sleep per night. The observations represent the average reaction time (in milliseconds) on a series of tests given each day to each subject.
After loading the package, the data can be loaded as follows. I show the first few observations.
library(lme4)
data("sleepstudy")
head(sleepstudy)
## Reaction Days Subject
## 1 249.5600 0 308
## 2 258.7047 1 308
## 3 250.8006 2 308
## 4 321.4398 3 308
## 5 356.8519 4 308
## 6 414.6901 5 308
Run a regression with Reaction as the target variable and Days as the predictor.
Run a mixed model with a random intercept for Subject.
Interpret the variance components and fixed effects.
What would a plot of the prediction lines per student look like relative to the overall trend?
Rerun the mixed model with the GPA data adding the cluster level
covariate of sex, or high school GPA
(highgpa), or both. Interpret all aspects of the
results.
gpa_mixed_cluster_level = lmer(?, gpa)
summary(gpa_mixed_cluster_level)
What happened to the student variance after adding cluster level covariates to the model?
The following represents a simple way to simulate a random intercepts model. Note what each object is, and make sure the code makes sense to you. Then run it.
set.seed(1234) # this will allow you to exactly duplicate your result
Ngroups = 100
NperGroup = 3
N = Ngroups * NperGroup
groups = factor(rep(1:Ngroups, each = NperGroup))
u = rnorm(Ngroups, sd = .5)
e = rnorm(N, sd = .25)
x = rnorm(N)
y = 2 + .5 * x + u[groups] + e
d = data.frame(x, y, groups)
Which of the above represent the fixed and random effects? Now run the following.
model = lmer(y ~ x + (1|groups), data=d)
summary(model)
## Linear mixed model fit by REML ['lmerMod']
## Formula: y ~ x + (1 | groups)
## Data: d
##
## REML criterion at convergence: 276.6
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.28382 -0.54016 -0.03464 0.54466 2.46986
##
## Random effects:
## Groups Name Variance Std.Dev.
## groups (Intercept) 0.22908 0.4786
## Residual 0.06254 0.2501
## Number of obs: 300, groups: groups, 100
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 1.93788 0.05001 38.75
## x 0.50747 0.01752 28.97
##
## Correlation of Fixed Effects:
## (Intr)
## x 0.029
confint(model)
## Computing profile confidence intervals ...
## 2.5 % 97.5 %
## .sig01 0.4108568 0.5568699
## .sigma 0.2268842 0.2760810
## (Intercept) 1.8394120 2.0363239
## x 0.4731324 0.5419848
library(ggplot2)
ggplot(aes(x, y), data=d) +
geom_point()
Do the results seem in keeping with what you expect?
In what follows we'll change various aspects of the data, then rerun the model after each change, then summarize and get confidence intervals as before. For each change, note specifically at least one thing that changed in the results. In particular, keep an eye on the intraclass correlation (ICC):
\[\frac{\textrm{random effect variance}}{\textrm{residual + random effect variance}}\]
In addition, create a density plot of the random effects as follows.
re = ranef(model)$groups[['(Intercept)']]
qplot(x = re, geom = 'density', xlim = c(-3, 3))
Change the random effect variance/sd and/or the residual variance/sd and note your new estimate of the ICC, and plot the random effect as before.
Reset the values to the original. Change Ngroups to 50. What differences do you see in the confidence interval estimates?
Set Ngroups back to 100. Now change NperGroup to 10, and note again how the CIs differ from the base condition.
Returning to the GPA data, picture a plot of each student's GPA across semesters: all student paths shown as faded lines, a sample of 10 shown in bold, and the overall trend, as estimated by the regression we'll run next, shown in red. Two things stand out. One is that students have a lot of variability when starting out. Secondly, while the general trend in GPA is upward over time as we'd expect, individual students may vary in that trajectory.
So let's get started. First we'll look at a regression with only the time indicator as a covariate, which we'll treat as numeric for simplicity. Note that I present a cleaner version of the summarized objects for the purposes of this document.
load('data/gpa.RData')
gpa_lm = lm(gpa ~ occasion, data = gpa)
## summary(gpa_lm)
|  | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 2.599 | 0.018 | 145.7 | 0 |
| occasion | 0.106 | 0.006 | 18.04 | 0 |
| Observations | Residual Std. Error | \(R^2\) | Adjusted \(R^2\) |
|---|---|---|---|
| 1200 | 0.3487 | 0.2136 | 0.2129 |
The above tells us that starting out, i.e. when occasion is zero, the average GPA, denoted by the intercept, is 2.6. In addition, as we move from semester to semester, we can expect GPA to increase by about 0.11 points. This would be fine except that we are ignoring the clustering. A side effect of doing so is that our standard errors are biased, and thus claims about statistical significance based on them would be off. More important, however, is that we simply don't get to explore the student effect, which would be of interest by itself.
An alternative approach would be to run a separate regression for every student. However, there are many drawbacks to this: it's not easily summarized when there are many groups, there would typically be very little data within each cluster to do so (as in this case), and the models are over-contextualized, meaning they ignore what students have in common. We'll compare such an approach to the mixed model later.
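For reference, lme4 provides a convenience function, lmList, for fitting such per-group regressions in one call. A sketch using the built-in sleepstudy data (the same idea applies to the gpa data):

```r
library(lme4)

# One completely separate regression per subject (no pooling across subjects)
fits = lmList(Reaction ~ Days | Subject, data = sleepstudy)

head(coef(fits))  # per-subject intercepts and slopes, estimated independently
```

Because each fit uses only that group's handful of observations, the estimates are noisy; the mixed model instead shares information across groups.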
Next we run a mixed model that will allow for a student specific
effect. Such a model is easily conducted in R, specifically with the
package lme4. In the following, the code will
look just like what you used for regression with lm, but with an additional component specifying the
group, i.e. student, effect. The (1|student) means that we
are allowing the intercept, represented by 1, to vary by
student. With the mixed model, we get the same results as the
regression, but as we’ll see we’ll have more to talk about.
library(lme4)
gpa_mixed = lmer(gpa ~ occasion + (1 | student), data = gpa)
## summary(gpa_mixed)
| term | value | se | t | p_value | lower_2.5 | upper_97.5 |
|---|---|---|---|---|---|---|
| Intercept | 2.599 | 0.022 | 119.800 | 0 | 2.557 | 2.642 |
| occasion | 0.106 | 0.004 | 26.096 | 0 | 0.098 | 0.114 |
| group | effect | variance | sd |
|---|---|---|---|
| student | Intercept | 0.064 | 0.252 |
| Residual | NA | 0.058 | 0.241 |
First we see that the coefficients for the intercept and time, which in this context are called the fixed effects, are the same as we saw with the standard regression, as is their interpretation. The standard errors, on the other hand, are different here, though in the end our conclusion would be the same as far as statistical significance goes. Note specifically that the standard error for the intercept has increased. Conceptually, you can think of it this way: allowing a random intercept per person lets us gain information about individuals, while acknowledging uncertainty about the overall average that we were underestimating before.
While we have coefficients and standard errors, you might have noticed that lme4 does not provide p-values! There are several reasons for this, namely that with mixed models we are essentially dealing with different sample sizes: the \(N_c\) within clusters, which may vary from cluster to cluster (and may even be a single observation!), and the N total observations. This puts us in a fuzzy situation with regard to reference distributions, denominator degrees of freedom, and how to approximate a 'best' solution. Other programs provide p-values automatically as if there were no issue, and without telling you which approach they use to calculate them (there are several). Furthermore, those approximations may be very poor in some scenarios, or make assumptions that are not appropriate for the situation.
However, it’s more straightforward to get confidence intervals, and we can do so with lme4 as follows[^confint].
confint(gpa_mixed)
## Computing profile confidence intervals ...
| group | effect | variance | sd | sd_2.5 | sd_97.5 | var_prop |
|---|---|---|---|---|---|---|
| student | Intercept | 0.064 | 0.252 | 0.225 | 0.282 | 0.523 |
| Residual | NA | 0.058 | 0.241 | 0.231 | 0.252 | 0.477 |
One thing that’s new compared to the standard regression output is the estimated standard deviation/variance of the student effect (\(\tau\)/\(\tau^2\) in our formula depiction from before). This tells us how much, on average, GPA bounces around as we move from student to student. In other words, even after making a prediction based on time point, each student has their own unique deviation, and that value (in terms of the standard deviation) is the estimated average deviation across students. Note that GPA moves more than twice as much due to the student (0.252) as it does from one semester to the next (0.106). This is an important interpretive aspect not available to us with a standard regression model.
Another way to interpret the variance output is to note the percentage of the total variance attributable to the student effect: 0.064 / 0.122 = 52%. In this setting, this value is also called the intraclass correlation, because it is also an estimate of the within-cluster correlation, as we’ll see later.
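To make that arithmetic explicit, here is the proportion computed directly from the variance components reported above (a minimal sketch; the values are copied from the model output, and the computation itself is language-agnostic):

```python
# Variance components taken from the model output above
tau_sq = 0.064    # between-student (random intercept) variance
sigma_sq = 0.058  # residual variance

# Intraclass correlation: the student share of the total variance
icc = tau_sq / (tau_sq + sigma_sq)
print(round(icc, 2))  # 0.52
```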
After running the model, we can actually get estimates of the student effects[^blup]. I show two ways for the first five students, both as random effect and as random intercept (i.e. intercept + random effect).
ranef(gpa_mixed)$student %>% head(5)
# showing mixedup::extract_random_effects(gpa_mixed)
| group_var | effect | group | value | se | lower_2.5 | upper_97.5 |
|---|---|---|---|---|---|---|
| student | Intercept | 1 | -0.071 | 0.092 | -0.251 | 0.109 |
| student | Intercept | 2 | -0.216 | 0.092 | -0.395 | -0.036 |
| student | Intercept | 3 | 0.088 | 0.092 | -0.091 | 0.268 |
| student | Intercept | 4 | -0.187 | 0.092 | -0.366 | -0.007 |
| student | Intercept | 5 | 0.030 | 0.092 | -0.149 | 0.210 |
coef(gpa_mixed)$student %>% head(5)
| group_var | effect | group | value | se | lower_2.5 | upper_97.5 |
|---|---|---|---|---|---|---|
| student | Intercept | 1 | 2.528 | 0.095 | 2.343 | 2.713 |
| student | Intercept | 2 | 2.383 | 0.095 | 2.198 | 2.568 |
| student | Intercept | 3 | 2.687 | 0.095 | 2.502 | 2.872 |
| student | Intercept | 4 | 2.412 | 0.095 | 2.227 | 2.597 |
| student | Intercept | 5 | 2.629 | 0.095 | 2.444 | 2.814 |
Note that we did not allow occasion to vary, so it is a constant, i.e. fixed, effect for all students.
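The two tables above are linked by simple addition: the coef output is just the fixed intercept plus each student’s random effect. A quick check of that arithmetic, using the rounded values copied from the output (a sketch, not part of the model fitting):

```python
fixed_intercept = 2.599  # fixed-effect intercept from the earlier table
# Random effects for the first five students, from the ranef output
random_effects = [-0.071, -0.216, 0.088, -0.187, 0.030]

# coef() = fixed intercept + student-specific random effect
student_intercepts = [round(fixed_intercept + re, 3) for re in random_effects]
print(student_intercepts)  # [2.528, 2.383, 2.687, 2.412, 2.629]
```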
Often, we are keenly interested in these effects, and want some sense of the uncertainty regarding them. With lme4 this typically would be done via bootstrapping, specifically with the bootMer function within lme4. However, for some users this may be a somewhat complex undertaking. The merTools package provides an easier way to get this with the predictInterval function[^predinterval]. Or you can go straight to a plot of them.
library(merTools)
predictInterval(gpa_mixed) # for various model predictions, possibly with new data
REsim(gpa_mixed) # mean, median and sd of the random effect estimates
plotREsim(REsim(gpa_mixed)) # plot the interval estimates
The following plot is of the estimated random effects for each student and their interval estimate (a modified version of the plot produced by that last line of code[^mertoolsplotlabels]). Recall that the random effects are normally distributed with a mean of zero, shown by the horizontal line. Intervals that do not include zero are in bold. In this case, such students are relatively higher or lower starting out compared to a typical student.
Let’s now examine standard predictions vs. cluster-specific predictions. As with most R models, we can use the predict function on the model object.
predict(gpa_mixed, re.form=NA) %>% head()
## 1 2 3 4 5 6
## 2.599214 2.705529 2.811843 2.918157 3.024471 3.130786
In the above code we specified not to use the random effects with re.form = NA, and as such, our predictions for the observations are essentially what we’d get from the standard linear model.
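Those fixed-effects-only predictions are just the linear-model arithmetic: intercept plus slope times occasion. A small check using the rounded coefficients from the earlier table (a sketch; the exact values above differ slightly only due to rounding):

```python
b_intercept, b_occasion = 2.599, 0.106  # rounded fixed effects

# Prediction with re.form = NA: no student-specific adjustment
def predict_fixed(occasion):
    return b_intercept + b_occasion * occasion

print([round(predict_fixed(t), 3) for t in range(3)])  # [2.599, 2.705, 2.811]
```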
predict_no_re = predict(gpa_mixed, re.form=NA)
predict_lm = predict(gpa_lm)
But each person has their unique intercept, so let’s see how the predictions differ when we incorporate that information.
predict_with_re = predict(gpa_mixed)
Depending on the estimated student effect, students will start above or below the estimated intercept for all students. The following visualizes the unconditional prediction vs. the conditional prediction that incorporates the random intercept for the first two students.
We can see that the predictions from the mixed model are shifted because of having a different intercept. For these students, the shift reflects their relatively poorer start.
Recall our depiction of the mixed model as a multilevel model.
\[\mathrm{gpa} = b_{\mathrm{int\_student}} + b_{\mathrm{occ}}\cdot \mathrm{occasion} + \epsilon\]
\[b_{\mathrm{int\_student}} = b_{\mathrm{intercept}} + \mathrm{effect}_{\mathrm{student}}\] If we add a student level covariate, e.g. sex, to the model, we then have the following.
\[b_{\mathrm{int\_student}} = b_{\mathrm{intercept}} + b_{\mathrm{sex}}\cdot \mathrm{sex} + \mathrm{effect}_{\mathrm{student}}\]
After plugging in, we see that we still have the same model as before, just with an additional predictor.
\[\mathrm{gpa} = b_{\mathrm{intercept}} + b_{\mathrm{occ}}\cdot \mathrm{occasion}+ b_{\mathrm{sex}}\cdot \mathrm{sex} + (\mathrm{effect}_{\mathrm{student}} + \epsilon)\]
So in the end, adding cluster level covariates doesn’t have any unusual effect on how we think about the model[^mlevel]. We simply add them to our set of predictor variables. Note also, that we can create cluster level covariates as group means or some other summary of the observation level variables. This is especially common when the clusters represent geographical units and observations are people. For example, we might have income as a person level covariate, and use the median to represent the overall wealth of the geographical region.
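As a sketch of that last idea, here is one way to construct such a cluster-level covariate from observation-level data (the region/income values are hypothetical, purely for illustration):

```python
from statistics import median

# Hypothetical person-level data: (region, income) -- illustration only
people = [("A", 30), ("A", 50), ("A", 40), ("B", 80), ("B", 100)]

# Collect incomes by region, then summarize into a cluster-level covariate
incomes_by_region = {}
for region, income in people:
    incomes_by_region.setdefault(region, []).append(income)
region_median = {r: median(v) for r, v in incomes_by_region.items()}

# Attach the cluster-level value back to each observation
augmented = [(r, inc, region_median[r]) for r, inc in people]
print(region_median)  # region A -> 40, region B -> 90
```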
Mixed models allow us to take into account observed structure in the data. If this were all they were used for, we would have more accurate inference relative to what we would have if we ignored that structure. However, we get much more! We better understand the sources of variability in the target variable. We also get group specific estimates of the parameters in the model, allowing us to understand exactly how the groups differ from one another. Furthermore, this in turn allows for group specific prediction, and thus much more accurate prediction, assuming there is appreciable variance due to the clustering. In short, there is much to be gained by mixed models, even in the simplest of settings.
For this exercise, we’ll use the sleep study data from the lme4 package. The following describes it.
The average reaction time per day for subjects in a sleep deprivation study. On day 0 the subjects had their normal amount of sleep. Starting that night they were restricted to 3 hours of sleep per night. The observations represent the average reaction time (in milliseconds) on a series of tests given each day to each subject.
After loading the package, the data can be loaded as follows. I show the first few observations.
head(sleepstudy) %>%
kable() %>%
kable_styling()
| Reaction | Days | Subject |
|---|---|---|
| 249.5600 | 0 | 308 |
| 258.7047 | 1 | 308 |
| 250.8006 | 2 | 308 |
| 321.4398 | 3 | 308 |
| 356.8519 | 4 | 308 |
| 414.6901 | 5 | 308 |
Run a regression with Reaction as the target variable and Days as the predictor.
Run a mixed model with a random intercept for Subject.
Interpret the variance components and fixed effects.
Rerun the mixed model with the GPA data
adding the cluster level covariate of sex, or high school
GPA (highgpa), or both. Interpret all aspects of the
results.
What happened to the student variance after adding cluster level covariates to the model?
The following represents a simple way to simulate a random intercepts model. Note what each object is, and make sure the code makes sense to you. Then run it.
set.seed(1234) # this will allow you to exactly duplicate your result
Ngroups = 100
NperGroup = 3
N = Ngroups * NperGroup
groups = factor(rep(1:Ngroups, each = NperGroup))
u = rnorm(Ngroups, sd = .5)
e = rnorm(N, sd = .25)
x = rnorm(N)
y = 2 + .5 * x + u[groups] + e
d = data.frame(x, y, groups)
Which of the above represent the fixed and random effects? Now run the following.
model = lmer(y ~ x + (1|groups), data=d)
summary(model)
confint(model)
library(ggplot2)
ggplot(aes(x, y), data=d) +
geom_point()
Do the results seem in keeping with what you expect?
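One expectation worth checking: given the standard deviations chosen in the simulation, the implied intraclass correlation can be computed directly (a quick sketch of the arithmetic, which you can compare to the variance proportions in your model summary):

```python
# Standard deviations used in the simulation above
sd_u = 0.5   # random intercept (group) standard deviation
sd_e = 0.25  # residual standard deviation

# Implied intraclass correlation: between-group variance over total variance
icc = sd_u**2 / (sd_u**2 + sd_e**2)
print(icc)  # 0.8
```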
In what follows we’ll change various aspects of the data, rerun the model after each change, and summarize and get confidence intervals as before. For each change, note at least one thing that differs in the results.